HMM-based stressed speech modeling with application to improved synthesis and recognition of isolated speech under stress

Authors

  • Sahar E. Bou-Ghazale
  • John H. L. Hansen
Abstract

In this study, a novel approach is proposed for modeling speech parameter variations between neutral and stressed conditions, and it is employed in a technique for stressed speech synthesis and recognition. The proposed method consists of modeling the variations in pitch contour, voiced speech duration, and average spectral structure using hidden Markov models (HMMs). While HMMs have traditionally been used for recognition applications, here they are employed to statistically model the characteristics needed for generating pitch contour and spectral perturbation contour patterns that modify the speaking style of isolated neutral words. The proposed HMMs are both speaker- and word-independent, but unique to each speaking style. While the modeling scheme is applicable to a variety of stress and emotional speaking styles, the evaluations presented in this study focus on angry speech, the Lombard effect, and loud speech, and cover three areas. First, formal subjective listener evaluations of the modified speech confirm the HMMs' ability to capture the parameter variations under stressed conditions. Second, an objective evaluation using a separately formulated stress classifier is employed to assess the presence of stress imparted to the synthetic speech. Finally, the stressed speech is also used for training and is shown to measurably improve the performance of an HMM-based stressed speech recognizer.


Similar resources

Synthesis of stressed speech from isolated neutral speech using HMM-based models

In this study, a novel approach is proposed for modeling speech parameter variations between neutral and stressed conditions and employed in a technique for stressed speech synthesis. The proposed method consists of modeling the variations in pitch contour, voiced speech duration, and average spectral structure using Hidden Markov Models (HMMs). While HMMs have traditionally been used for recog...

Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model

To facilitate the entry of data into the computer and its digitization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...

Convolutional neural network with adaptive windows for speech recognition

Although speech recognition systems are widely used and their accuracy continues to improve, there is still a considerable gap between their performance and human recognition ability. This is partially due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...

Duration and spectral based stress token generation for HMM speech recognition under stress

In this paper, we address the problem of isolated word recognition of speech under various stressed speaking conditions. The main objective is to formulate an alternate training algorithm for hidden Markov model recognition, which better characterizes actual speech production under stressed speaking styles such as slow, loud and Lombard effect, without the need for collecting such stressed sp...

Generating stressed speech from neutral speech using a modified CELP vocoder

The problem of speech modeling for generating stressed speech using a source generator framework is addressed in this paper. In general, stress in this context refers to emotional or task induced speaking conditions. Throughout this particular study, the focus will be limited to speech under angry, loud and Lombard effect (i.e., speech produced in noise) speaking conditions. Source generator th...

Journal:
  • IEEE Trans. Speech and Audio Processing

Volume 6

Publication date: 1998